
    Space-Time Complexity in Hamiltonian Dynamics

    New notions of the complexity function C(epsilon;t,s) and the entropy function S(epsilon;t,s) are introduced to describe systems with nonzero or zero Lyapunov exponents, or systems that exhibit strongly intermittent behavior with "flights", trappings, weak mixing, etc. The key ingredient of the new notions is the first appearance of epsilon-separation of initially close trajectories. The complexity function is similar to the propagator p(t0,x0;t,x) with x replaced by the natural length s of trajectories, and its introduction does not assume space-time independence in the evolution of the system. Special emphasis is placed on the choice of variables: replacing t by eta=ln(t) and s by xi=ln(s) makes it possible to consider time-algebraic and space-algebraic complexity and some mixed cases. It is shown that for typical cases the entropy function S(epsilon;xi,eta) possesses invariants (alpha,beta) that describe the fractal dimensions of the space-time structures of trajectories. The invariants (alpha,beta) can be linked to the transport properties of the system on the one hand, and to the Riemann invariants for simple waves on the other. This analogy provides a new meaning for the transport exponent mu, which can be considered the speed of a Riemann wave in the log-phase space of the log-space-time variables. Some other applications of the new notions are considered and numerical examples are presented. Comment: 27 pages, 6 figures

    A haplome alignment and reference sequence of the highly polymorphic Ciona savignyi genome

    The high degree of polymorphism in the genome of the sea squirt Ciona savignyi complicated the assembly of sequence contigs, but a new alignment method results in a much improved sequence

    VARiD: A variation detection framework for color-space and letter-space platforms

    Motivation: High-throughput sequencing (HTS) technologies are transforming the study of genomic variation. The various HTS technologies have different sequencing biases and error rates. While most HTS technologies sequence the residues of the genome directly, generating base calls for each position, the Applied Biosystems SOLiD platform generates dibase-coded (color space) sequences. Combining data from the various platforms should increase the accuracy of variation detection, but to date there are only a few tools that can identify variants from color space data, and none that can analyze color space and regular (letter space) data together.
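
The dibase (color space) coding mentioned above can be illustrated with a minimal sketch. This is not code from VARiD; it assumes the commonly described 2-bit mapping A=0, C=1, G=2, T=3, under which the color of two adjacent bases is the XOR of their codes. The function names are hypothetical.

```python
# Illustrative sketch of dibase (color-space) encoding; not code from VARiD.
# Assumption: 2-bit mapping A=0, C=1, G=2, T=3; the color of two adjacent
# bases is the XOR of their codes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def to_color_space(seq: str) -> list[int]:
    """Encode a letter-space sequence as one color per adjacent base pair."""
    codes = [BASE_CODE[b] for b in seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def to_letter_space(first_base: str, colors: list[int]) -> str:
    """Decode colors back to letters, given the known first base."""
    code = BASE_CODE[first_base]
    out = [first_base]
    for color in colors:
        code ^= color  # each color tells us how to step to the next base
        out.append(CODE_BASE[code])
    return "".join(out)


read = "ATGGCA"
colors = to_color_space(read)               # [3, 1, 0, 3, 1]
decoded = to_letter_space(read[0], colors)  # "ATGGCA"

# A single miscalled color corrupts every downstream base when decoded naively.
bad_colors = colors.copy()
bad_colors[1] = 2
corrupted = to_letter_space(read[0], bad_colors)  # "ATCCGT"
```

Because one color error propagates through the rest of a naively decoded read, variant callers generally need to reason about color space data directly, which is the gap the abstract describes.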

    Sequence alignment, mutual information, and dissimilarity measures for constructing phylogenies

    Existing sequence alignment algorithms use heuristic scoring schemes which cannot be used as objective distance metrics. Therefore one relies on measures like the p- or log-det distances, or makes explicit, and often simplistic, assumptions about sequence evolution. Information theory provides an alternative in the form of mutual information (MI), which is, in principle, an objective and model-independent similarity measure. MI can be estimated by concatenating and zipping sequences, thereby yielding the "normalized compression distance". So far this has produced promising results, but with uncontrolled errors. We describe a simple approach to get robust estimates of MI from global pairwise alignments. Using standard alignment algorithms, this gives estimates for animal mitochondrial DNA that are strikingly close to estimates obtained from the alignment-free methods mentioned above. Our main result uses algorithmic (Kolmogorov) information theory, but we show that similar results can also be obtained from Shannon theory. Because it is not additive, normalized compression distance is not an optimal metric for phylogenetics, but we propose a simple modification that overcomes the issue of additivity. We test several versions of our MI-based distance measures on a large number of randomly chosen quartets and demonstrate that they all perform better than traditional measures like the Kimura or log-det (resp. paralinear) distances. Even a simplified version based on single-letter Shannon entropies, which can be easily incorporated in existing software packages, gave superior results throughout the entire animal kingdom. But we see the main virtue of our approach in a more general way. For example, it can also help to judge the relative merits of different alignment algorithms, by estimating the significance of specific alignments. Comment: 19 pages + 16 pages of supplementary material
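
The "concatenate and zip" estimate referred to above can be sketched as follows. This illustrates only the standard normalized compression distance, NCD(x,y) = (C(xy) - min(C(x),C(y))) / max(C(x),C(y)), with zlib standing in for the compressor C; it is not the alignment-based MI estimator the abstract proposes, and the test sequences are synthetic.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, a crude stand-in for complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Synthetic data (an assumption for illustration, not real mitochondrial DNA).
rng = random.Random(0)
a = bytes(rng.choice(b"ACGT") for _ in range(2000))

# b is a lightly mutated copy of a: one substitution every 100 bases.
b_arr = bytearray(a)
for i in range(0, len(b_arr), 100):
    b_arr[i] = ord("C") if b_arr[i] == ord("A") else ord("A")
b = bytes(b_arr)

# c is an unrelated random sequence of the same length.
c = bytes(rng.choice(b"ACGT") for _ in range(2000))

d_related = ncd(a, b)
d_unrelated = ncd(a, c)
```

With a general-purpose compressor the related pair should score clearly lower than the unrelated pair, but the absolute values depend on the compressor and sequence length, which is one source of the "uncontrolled errors" the abstract mentions.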

    MISHIMA - a new method for high speed multiple alignment of nucleotide sequences of bacterial genome scale data

    Background: Large nucleotide sequence datasets are becoming increasingly common objects of comparison. Complete bacterial genomes are reported almost every day. This creates challenges for developing new multiple sequence alignment methods. Conventional multiple alignment methods are based on pairwise alignment and/or progressive alignment techniques. These approaches have performance problems when the number of sequences is large and when dealing with genome-scale sequences. Results: We present a new method of multiple sequence alignment, called MISHIMA (Method for Inferring Sequence History In terms of Multiple Alignment), that does not depend on pairwise sequence comparison. A new algorithm is used to quickly find rare oligonucleotide sequences shared by all sequences. A divide-and-conquer approach is then applied to break the sequences into fragments that can be aligned independently by an external alignment program. These partial alignments are assembled to form a complete alignment of the original sequences. Conclusions: MISHIMA provides improved performance compared to the commonly used multiple alignment methods. As an example, six complete genome sequences of the bacterial species Helicobacter pylori (about 1.7 Mb each) were successfully aligned in about 6 hours using a single PC.
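
The idea of anchoring on rare oligonucleotides shared by all sequences can be sketched roughly as below. This is not the MISHIMA algorithm, whose details the abstract does not give; it is a naive illustration that assumes "rare" means "occurs exactly once in each sequence", and the function names are hypothetical.

```python
from collections import Counter


def unique_kmers(seq: str, k: int) -> dict[str, int]:
    """Map each k-mer that occurs exactly once in seq to its start position."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}


def shared_anchors(seqs: list[str], k: int) -> list[tuple[str, list[int]]]:
    """Return k-mers unique in every sequence, with their position in each."""
    per_seq = [unique_kmers(s, k) for s in seqs]
    shared = set(per_seq[0]).intersection(*per_seq[1:])
    anchors = [(kmer, [u[kmer] for u in per_seq]) for kmer in shared]
    # Order anchors by their position in the first sequence.
    return sorted(anchors, key=lambda a: a[1][0])


seqs = ["TTACGGATT", "CCACGGACC", "GGACGGAGG"]
anchors = shared_anchors(seqs, k=5)  # [("ACGGA", [2, 2, 2])]
```

In a divide-and-conquer scheme, anchors like these would split each sequence into corresponding fragments that can be aligned independently. A practical tool would also have to choose k, discard anchors that appear in inconsistent order across sequences, and scale to genome-sized input, none of which this sketch attempts.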

    Recurrence and algorithmic information

    In this paper we initiate a somewhat detailed investigation of the relationships between quantitative recurrence indicators and algorithmic complexity of orbits in weakly chaotic dynamical systems. We mainly focus on examples. Comment: 26 pages, no figures

    M-GCAT: interactively and efficiently constructing large-scale multiple genome comparison frameworks in closely related species

    BACKGROUND: Due to recent advances in whole genome shotgun sequencing and assembly technologies, the financial cost of decoding an organism's DNA has been drastically reduced, resulting in a recent explosion of genomic sequencing projects. This increase in related genomic data will allow for in-depth studies of evolution in closely related species through multiple whole genome comparisons. RESULTS: To facilitate such comparisons, we present an interactive multiple genome comparison and alignment tool, M-GCAT, that can efficiently construct multiple genome comparison frameworks in closely related species. M-GCAT is able to compare and identify highly conserved regions in up to 20 closely related bacterial species in minutes on a standard computer, and as many as 90 (containing 75 cloned genomes from a set of 15 published enterobacterial genomes) in an hour. M-GCAT also incorporates a novel comparative genomics data visualization interface allowing the user to globally and locally examine and inspect the conserved regions and gene annotations. CONCLUSION: M-GCAT is an interactive comparative genomics tool well suited for quickly generating multiple genome comparison frameworks and alignments among closely related species. M-GCAT is freely available for download for academic and non-commercial use at:

    Savant Genome Browser 2: visualization and analysis for population-scale genomics

    High-throughput sequencing (HTS) technologies are providing an unprecedented capacity for data generation, and there is a corresponding need for efficient data exploration and analysis capabilities. Although most existing tools for HTS data analysis are developed for either automated (e.g. genotyping) or visualization (e.g. genome browsing) purposes, such tools are most powerful when combined. For example, integration of visualization and computation allows users to iteratively refine their analyses by updating computational parameters within the visual framework in real-time. Here we introduce the second version of the Savant Genome Browser, a standalone program for visual and computational analysis of HTS data. Savant substantially improves upon its predecessor and existing tools by introducing innovative visualization modes and navigation interfaces for several genomic datatypes, and synergizing visual and automated analyses in a way that is powerful yet easy even for non-expert users. We also present a number of plugins that were developed by the Savant Community, which demonstrate the power of integrating visual and automated analyses using Savant. The Savant Genome Browser is freely available (open source) at www.savantbrowser.co

    Complexity for extended dynamical systems

    We consider dynamical systems for which the spatial extension plays an important role. For these systems, the notions of attractor, epsilon-entropy and topological entropy per unit time and volume have been introduced previously. In this paper we use the notion of Kolmogorov complexity to introduce, for extended dynamical systems, a notion of complexity per unit time and volume which plays the same role as the metric entropy for classical dynamical systems. We introduce this notion as an almost sure limit on orbits of the system. Moreover, we prove a kind of variational principle for this complexity. Comment: 29 pages